OrganismTagger: detection, normalization and grounding of organism entities in biomedical documents

Abstract

Motivation: Semantic tagging of organism mentions in full-text articles is an important part of literature mining and semantic enrichment solutions. Tagged organism mentions also play a pivotal role in disambiguating other entities in a text, such as proteins. A high-precision organism tagging system must be able to detect the numerous forms of organism mentions, including common names as well as the traditional taxonomic groups: genus, species and strains. In addition, such a system must resolve abbreviations and acronyms, assign the scientific name and, if possible, link the detected mention to the NCBI Taxonomy database for further semantic queries and literature navigation.

Results: We present the OrganismTagger, a hybrid rule-based/machine learning system to extract organism mentions from the literature. It includes tools for automatically generating lexical and ontological resources from a copy of the NCBI Taxonomy database, thereby facilitating system updates by end users. Its novel ontology-based resources can also be reused in other semantic mining and linked data tasks. Each detected organism mention is normalized to a canonical name through the resolution of acronyms and abbreviations and subsequently grounded with an NCBI Taxonomy database ID. In particular, our system combines a novel machine-learning approach with rule-based and lexical methods for detecting strain mentions in documents. On our manually annotated OT corpus, the OrganismTagger achieves a precision of 95%, a recall of 94% and a grounding accuracy of 97.5%. On the manually annotated corpus of Linnaeus-100, the results show a precision of 99%, recall of 97% and grounding accuracy of 97.4%.

Availability: The OrganismTagger, including supporting tools, resources, training data and manual annotations, as well as end user and developer documentation, is freely available under an open-source license at http://www.semanticsoftware.info/organism-tagger.

Contact: witte@semanticsoftware.info
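
The normalization and grounding steps described in the abstract can be illustrated with a minimal Python sketch. This is not the OrganismTagger's implementation: the tiny `LEXICON` and `COMMON_NAMES` dictionaries and the functions `normalize` and `ground` are hypothetical names introduced here, standing in for resources that the real system generates from the NCBI Taxonomy database. The sketch only shows the general idea of resolving an abbreviated or common name to a canonical scientific name and then attaching a taxonomy ID.

```python
import re

# Hypothetical mini-lexicon: canonical scientific name -> NCBI Taxonomy ID.
# Illustrative only; a real system would derive this from a taxonomy dump.
LEXICON = {
    "Escherichia coli": 562,
    "Homo sapiens": 9606,
    "Saccharomyces cerevisiae": 4932,
}

# Hypothetical common-name table mapping to canonical names.
COMMON_NAMES = {
    "human": "Homo sapiens",
    "baker's yeast": "Saccharomyces cerevisiae",
}

# Matches abbreviated binomials such as "E. coli".
ABBREVIATED = re.compile(r"^([A-Z])\.\s+([a-z]+)$")


def normalize(mention):
    """Return the canonical name for a mention, or None if unresolved."""
    if mention in LEXICON:
        return mention
    common = COMMON_NAMES.get(mention.lower())
    if common is not None:
        return common
    match = ABBREVIATED.match(mention)
    if match:
        initial, epithet = match.groups()
        candidates = [
            name for name in LEXICON
            if name.split()[0].startswith(initial) and name.split()[1] == epithet
        ]
        # Resolve only when the abbreviation is unambiguous in the lexicon.
        if len(candidates) == 1:
            return candidates[0]
    return None


def ground(mention):
    """Return (canonical name, taxonomy ID), or (None, None) if unresolved."""
    name = normalize(mention)
    if name is None:
        return (None, None)
    return (name, LEXICON[name])


for text in ["E. coli", "human", "X. unknown"]:
    print(text, "->", ground(text))
```

A lexicon lookup like this covers only the simplest cases. Ambiguous abbreviations and strain mentions need context, which is where the abstract's combination of rule-based, lexical and machine-learning methods comes in.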